A computer-assisted comparative analysis of the amino acid sequences of (putative) thiol proteases encoded by the genomes of several diverse groups 
of positive-stranded RNA viruses and distantly related to the family of cellular papain-like proteases is presented. A high level of similarity was 
detected between the leader protease of foot-and-mouth-disease virus and the protease of murine hepatitis coronavirus which cleaves the N-terminal 
p28 protein from the polyprotein. A statistically significant alignment of a portion of the rubella virus polyprotein with cellular papain-like proteases 
was obtained, leading to tentative identification of the papain-like protease as the enzyme mediating processing of the non-structural proteins of 
this virus. Specific grouping between the sequences of the proteases of α-viruses, and poty- and bymoviruses was revealed. It was noted that papain-like 
proteases of positive-stranded RNA viruses are much more variable both in their sequences and in genomic locations than chymotrypsin-related 
proteases found in the same virus class. A novel conserved domain of unknown function has also been identified which flanks the papain-like 
proteases of α-, rubi- and coronaviruses. 


Papain-like protease; RNA virus; Polyprotein processing; Sequence motif; Catalytic center 


1. INTRODUCTION 


Polyprotein processing is the strategy employed by a 
number of groups of positive-stranded RNA viruses for 
genome expression (for review see [1]). Processing of 
membrane proteins of enveloped viruses is usually 
mediated by cellular proteases, whereas processing of 
non-membrane proteins is mediated by virus-encoded 
proteases. A large superfamily of virus-encoded proteases 
related to chymotrypsin-like cellular serine proteases has 
been described [2-5]. Some of these viral proteases have 
Cys substituted for the principal catalytic Ser, a 
substitution not found in cellular enzymes, and thus 
comprise a unique group of cysteine proteases. 

CP2, Dictyostelium discoideum cysteine proteinase 1 and 2; actin, 
Actinidia chinensis actinidin; papain and omega, Carica papaya 
proteinase I and III, respectively; aleur, Hordeum vulgare aleurain; 
Cat.H and L, Rattus norvegicus cathepsins H and L; Cat.B, Homo 
sapiens cathepsin B; SH-EP, Vigna mungo cysteine endopeptidase; 
bromel, Ananas comosus bromelain; derpt, Dermatophagoides 
pteronyssinus major mite fecal allergen; calp, Mus musculus calpain; 
SFV, Semliki forest virus; SNBV, Sindbis virus; VEEV, Venezuelan 
equine encephalomyelitis virus; ONNV, O’Nyong-Nyong virus; 
RRV, Ross River virus; MidV, Middelburg virus (alphaviruses); 
PVY, potato virus Y; PPV, plum pox virus; TEV, tobacco etch virus; 
TVMV, tobacco vein mottling virus (potyviruses); BaYMV, barley 
yellow mosaic virus (bymovirus); MHV, murine hepatitis virus; IBV, 
avian bronchitis virus (coronaviruses); FMDV A10, foot-and-mouth-disease 
virus A10 strain (aphthovirus); RuV, rubella virus (rubivirus); 
HC, helper component; M-pro, ‘main’ protease; L-pro, ‘leader’, or 
accessory protease (see text); SPL, ‘Streptococcus-like’ protease; CI, 
cylindrical inclusion (potyvirus protein). 


Published by Elsevier Science Publishers B.V. 

Only very recently, the existence of ‘classical’ cysteine 
proteases related to papain-like cellular enzymes has been 
claimed for several positive-stranded RNA viruses. The 
essential Cys and His residues were identified in the 
potyvirus CI [6], α-virus nsP2 [7,8] and murine coronavirus 
‘leader’ (L-pro) ([9], and Baker et al., submitted) proteases 
by site-directed mutagenesis. The relative positions of these 
residues in the respective proteins and their amino acid 
contexts resemble those of the catalytic residues in the 
papain-like proteases [6-9]. Two other putative papain-like 
proteases were revealed in the polyproteins of coronaviruses 
by comparative sequence analysis. The putative ‘main’ 
protease (M-pro) is related to MHV L-pro and is conserved 
in both IBV and MHV polyproteins [10], and the putative SPL 
protease shares similarity with the protease from 
Streptococcus and is present in the IBV polyprotein only [11]. 

Two proteases of picornaviruses, L-pro of FMDV 
and VP0, have not been characterized with respect to 
the type of the catalytic residues [1]. In addition, the 
presence of proteases could be suspected in several 
other viruses (e.g. rubi-, tymo- and furoviruses) encoding 
very large proteins [1,12]. Here we identify a 
putative protease (M-pro) encoded by the rubella virus 
genome and show that this protease and aphthovirus L-pro 
belong to the papain-like protease group. A novel 
conserved domain associated with the (putative) 
papain-like proteases of α-, corona- and rubiviruses is 
described. 


2. MATERIALS AND METHODS 


2.1. Amino acid sequences 
All sequences were from the SWISSPROT data bank Release 16, 
except for MHV [10], RuV [12], IBV [13], and BaYMV [14]. 


2.2. Comparative sequence analysis 

Amino acid sequences were compared by the program OPTAL as 
previously described [15] using the amino acid residue comparison 
matrix MDM78. Program OPTAL, implementing the Sankoff 
algorithm, generates multiple sequence alignments in a stepwise man- 
ner and calculates adjusted alignment scores as the number of stand- 
ard deviations (SD) over the mean of 25 random simulations. The 
program DotHelix [16], a module of the GENEBEE program package 
for biopolymer sequence analysis [17], was used to build up complete 
local similarity maps for pairs of amino acid sequences. 
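
The adjusted score described above (the number of standard deviations by which an alignment score exceeds the mean of 25 random simulations) can be illustrated with a minimal Python sketch. This is not OPTAL: the Sankoff multiple-alignment procedure and the MDM78 matrix are not reproduced here. The sketch assumes, purely for illustration, a pairwise Needleman-Wunsch score with simple match/mismatch/gap values, and it assumes that each random simulation is a shuffle of one sequence that preserves its composition. The function names and parameter defaults are hypothetical.

```python
import random
import statistics


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Stand-in scoring scheme (assumption); the paper used MDM78 via OPTAL.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def adjusted_score(a, b, n_random=25, seed=0):
    """Observed score expressed in SD units above the mean random score.

    Random scores come from aligning `a` against shuffled copies of `b`
    (assumed randomization scheme; composition of `b` is preserved).
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    random_scores = []
    for _ in range(n_random):
        shuffled = list(b)
        rng.shuffle(shuffled)
        random_scores.append(global_alignment_score(a, "".join(shuffled)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero variance")
    return (observed - mean) / sd


if __name__ == "__main__":
    # Made-up toy sequence, not taken from any database entry.
    toy = "MKTAYIAKQRCWGHLSDEVN"
    print(round(adjusted_score(toy, toy), 1))
```

With only 25 simulations the estimate of the mean and SD is coarse, so scores obtained this way are best read as rough indicators rather than exact significance levels.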


3. RESULTS AND DISCUSSION 


The sequences of cellular and viral papain-like proteases 
are quite variable; the only reliable conserved 
region is a stretch of approximately 10 amino acid 
residues centering at the catalytic Cys ([18], and 
unpublished observations). The sequences of positive-stranded 
RNA viral proteins which could be suspects 
for protease activity were searched for segments 
resembling this conserved stretch. The pieces of FMDV 
and RuV polyproteins selected in this way were analyzed 
in detail. 


3.1. Aphthovirus and coronavirus leader proteases are 
related 

Pronounced similarity was found between the 
segments around the putative catalytic Cys of coronavirus 
proteases (particularly MHV L-pro), and a sequence 
located near the N terminus of FMDV L-pro 
and containing a Cys residue conserved in all sequenced 
FMDV strains ([19]; Fig. 1). When the entire polyproteins 
of MHV and FMDV (more than 6900 and 2300 
amino acids, respectively) were compared by program 
DotHelix, these segments were found to be the most 
closely related, with their alignment score being about 8 
SD above the random expectation (not shown). These 
observations allow us to predict the catalytic Cys 
residue of the L-pro of FMDV. It has been shown that 
the substitution of Ile for Thr in the vicinity of this 
residue (Fig. 1) abolished the protease activity, unlike 
four other mutations in the N-terminal half of L-pro 
[19]. Due to the weak sequence conservation around the 
catalytic His of the identified viral papain-like proteases, 
all three histidines which are conserved in the 
sequenced FMDV strains remain, for the time being, 
candidates for the role of the catalytic residue. 


3.2. RuV polyprotein contains a protease-like domain 
A statistically significant alignment was obtained between 
a segment of the RuV non-structural polyprotein 
and cellular papain-like proteases, with sequence 
conservation around the (putative) catalytic Cys and His 
residues (Fig. 1). The sequence similarity between the 
rubivirus polyprotein fragment (residues from 1125 to 
1320) and 12 eucaryotic proteases could be characterized 
by scores in the range between 4.5 and 12.2 SD. 
These values were obtained upon aligning the RuV sequence 
with the cellular ones with or without the omission 
of inserts present in some of the cellular proteases, 
respectively (not shown). This identifies the putative 
rubella virus protease and demonstrates the so far most 
pronounced similarity between cellular and viral 
papain-like proteases. It was noted, however, that not 
all of the sequence segments highly conserved in cellular 
enzymes are retained in the putative protease of RuV. 
In particular, of the six Cys residues conserved in the 
cellular proteases, only two (including the catalytic one) 
are found in the rubella virus protein, suggesting that 
the two characteristic disulfide bridges of the cellular 
proteases are not conserved in the viral counterpart. 
Among the viral proteases, RuV M-pro shares the most 
convincing similarity with IBV M-pro in the region 
around the putative catalytic Cys residue (Fig. 1). 


3.3. Papain-like proteases of α-viruses, and poty- and 
bymoviruses constitute a distinct group 

Our analysis revealed a previously unnoticed 
resemblance between the papain-like proteases of 
alphaviruses, on the one hand, and potyviruses and the 
closely related bymovirus, on the other hand (Fig. 1). 
The adjusted alignment score was 5.5 SD for approximately 
150 amino acid residue domains of the two protease 
groups. The alignment showed well-conserved 
spacing of the catalytic residues and highlighted several 
additional invariant and conserved residues 
specifically characteristic of these two enzyme groups 
(Fig. 1). A notable common feature of the α-virus and 
poty-(bymo)virus proteases is their specificity towards 
pairs of small amino acid residues [6,7]. 


3.4. A novel conserved domain associated with the 
papain-like proteases of rubi-, α- and coronaviruses 
Analysis of the sequences of viral polyproteins surrounding 
the putative papain-like proteases unexpectedly led to the 
discovery of a new conserved domain. This domain has been 
described previously as the most similar segment in the 
α-virus and rubivirus polyproteins [12]. Independently, a 
strongly conserved region adjacent to the putative 
papain-like protease(s) was identified upon comparison of 
the polyproteins of the coronaviruses IBV and MHV [10]. 
Comparison of the two alignments obtained this way has 
suggested that all these domains constitute a single family 
(Fig. 2). Within this family, approximately the same level 
of similarity was observed between the sequences of all 
three groups of viruses. Screening of the SWISSPROT 
database provided no clue as to the possible function of 
this conserved domain (hereafter designated the ‘X’ domain). 

Fig. 1. Alignment of the segments around the catalytic Cys and His residues of cellular and (putative) positive-stranded RNA virus papain-like 
proteases. The numbers of amino acid residues separating the aligned segments are indicated. Plus, the (putative) catalytic residues; circles, residues 
conserved in the putative protease of RuV and most of the cellular papain-like proteases; asterisks, identities between the sequences around the 
putative catalytic residues of the proteases of RuV and IBV (M-pro), FMDV and MHV (L-pro), and the groups of the proteases of α-viruses and 
poty-/bymoviruses; colons, residues partially conserved in the latter two groups; question marks, the proteases which have been identified only 
by amino acid sequence comparison; bold asterisk, the proteases which have been added to the papain-like group in this study; arrow, the residue 
which was found to be replaced in the FMDV mutant lacking the protease activity. 


3.5. Concluding overview of viral papain-like proteases 

Fig. 2. Alignment of the ‘X’ domains of α-, rubi- and coronaviruses. The alignment was generated using the OPTAL program, yielding a score 
of over 6 SD for each step. Only the four most conserved segments are shown. The numbering is given for α-virus nsP3 protein and for rubi- and 
coronavirus non-structural polyproteins. Consensus: upper case, invariant residues; lower case, residues found in at least one sequence of each 
of the three virus groups (α-, rubi- and coronaviruses). The grouping of similar residues was as follows: I,L,V,M; F,Y,W; K,R; S,T; D,E,N,Q. 

Fig. 3. Location of the papain-like proteases and ‘X’ domains in viral polyproteins. Only polymerase (POL), helicase (HEL), chymotrypsin-related 
protease (PRO) and papain-related protease (L-pro, M-pro, SPL) domains are indicated. The designations of specific viral proteins are shown 
where appropriate. Papain-like proteases and ‘X’ domains are highlighted by respective hatching. In the IBV polyprotein, M-pro and SPL share 
a common segment which comprises the C-terminal portion of the first of these putative proteases, and the N-terminal portion of the second one. 

Together with the previously reported data on proteases 
of α-, corona- and potyviruses [6-11,20], these 
observations delineated the set of (putative) positive-stranded 
RNA virus papain-like proteases. As a whole, 
the sequences of these proteases around the proposed 
catalytic Cys and His residues have relatively little in 
common, except for the notable CW(Y) dipeptide. The 
sequences around the catalytic His residues of cellular 
and of (putative) viral proteases are even more variable 
(Fig. 1). The lengths of the spacers separating the two 
catalytic residues in the (putative) viral proteases varied 
from quite long (i.e. comparable to the longest found in 
cellular proteases) in the coronavirus enzymes to 
exceptionally short in the α-, poty- and bymovirus proteases 
(Fig. 1). 
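
The two features noted above, the CW(Y) dipeptide at the proposed catalytic Cys and a variable-length spacer before the catalytic His, suggest a crude way to flag candidate Cys/His pairs in a polyprotein sequence. The Python sketch below is only an illustration of that idea and is not the procedure used in this study, which relied on alignment scores. The function name and the spacer bounds are hypothetical; the text gives no numerical limits, so the defaults are arbitrary placeholders that a reader would need to set.

```python
import re


def candidate_catalytic_pairs(seq, min_spacer=20, max_spacer=250):
    """Return 1-based (Cys, His) position pairs.

    A pair is reported when a Cys is immediately followed by Trp or Tyr
    (the CW(Y) dipeptide) and a His lies downstream with between
    min_spacer and max_spacer residues separating the two.
    The spacer bounds are illustrative assumptions, not values from the paper.
    """
    pairs = []
    # Lookahead keeps the match on the Cys itself, so adjacent hits are not skipped.
    for m in re.finditer(r"C(?=[WY])", seq):
        cys = m.start()
        first = cys + 1 + min_spacer
        last = min(len(seq), cys + max_spacer + 2)
        for his in range(first, last):
            if seq[his] == "H":
                pairs.append((cys + 1, his + 1))
    return pairs


if __name__ == "__main__":
    # Made-up toy sequence: Cys-Trp at positions 3-4, His at position 10.
    toy = "AACW" + "G" * 5 + "H" + "AA"
    print(candidate_catalytic_pairs(toy, min_spacer=3, max_spacer=10))
```

Such a scan is far too permissive to identify a protease on its own, since a Cys followed by Trp or Tyr with a downstream His occurs by chance in many proteins; it could at most shortlist segments for the kind of alignment analysis described in this paper.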

The location of the (putative) papain-like proteases 
and of ‘X’ domains in virus polyproteins is highly 
variable (Fig. 3). Nevertheless, certain regularities could 
be noticed, and two groups of proteases could be 
delineated, based on their roles in the processing of the 
viral polyproteins. The first group includes the pro- 
teases of poty-, bymo- and aphthoviruses. These are 
‘accessory’ leader proteases mediating a single cleavage 
event at their own C termini, while most of the 
cleavages of the respective polyproteins are effected by 
chymotrypsin-related proteases [1]. Accordingly, these 
papain-like protease domains lie outside the arrays of 
domains directly involved in genome replication and ex- 
pression, occupying the very N-terminal (FMDV L pro- 
tein and the bymovirus putative protease) or near- 
terminal (potyvirus HC protein) positions in the 
polyproteins (Fig. 3). 

The second group encompasses the proteases of α-, 
and probably of rubiviruses, which appear to be the 
‘main’, and possibly the only enzymes responsible for 
the processing of non-structural polyproteins. These 
proteases constitute parts of the arrays of the domains 
mediating viral RNA replication and expression, which 
include the RNA polymerase and the (putative) helicase 
(Fig. 3). It is interesting that the proteases of the second, 
but not of the first group are associated with the 
‘X’ domain; thus, it is tempting to speculate that this 
domain might be involved in the regulation of the 
polyprotein processing. 